[SPARK-12346] [ML] Missing attribute names in GLM for vector-type features #10323

Closed
ericl wants to merge 2 commits

ericl commented Dec 16, 2015

Currently summary() fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing VectorAssembler to make up suitable names when inputs are missing names.

cc @mengxr
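
To make the fallback concrete, here is a minimal sketch of the naming idea in plain Scala. It does not use Spark, and `AttrNames.assembledNames` is a hypothetical helper written only for illustration; the `<column>_<index>` scheme is an assumption about what "suitable names" could look like, not a statement of the exact names the patch produces.

```scala
// Hypothetical helper, not part of Spark's API: it only illustrates
// how names can be made up for vector slots that have none.
object AttrNames {
  /** Returns one name per slot of the vector column `col`.
    * `existing(i)` is Some(name) when slot i already carries an attribute name.
    */
  def assembledNames(col: String, existing: Seq[Option[String]]): Seq[String] =
    existing.zipWithIndex.map {
      case (Some(name), _) => s"${col}_$name" // keep the known name, prefixed by the column
      case (None, i)       => s"${col}_$i"    // assumed scheme: fall back to the slot index
    }
}
```

Under this assumed scheme, `AttrNames.assembledNames("vec", Seq(None, None))` yields `Seq("vec_0", "vec_1")`, so every output feature has a name for `summary()` to report.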

test("vector attribute generation") {
  val formula = new RFormula().setFormula("id ~ vec")
  val original = sqlContext.createDataFrame(
    Seq((1, Vectors.dense(0.0, 1.0)), (2, Vectors.dense(1.0, 2.0)))
Contributor

Should we support vector-typed terms in an R formula? I think that's illegal in R.

Contributor Author

I think it makes sense when using RFormula in an ML pipeline (not necessarily in R).

SparkQA commented Dec 16, 2015

Test build #47806 has finished for PR 10323 at commit 1c66cdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@thunterdb

@ericl this looks great, thanks!

asfgit pushed a commit that referenced this pull request Jan 18, 2016
…ures

Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names.

cc mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #10323 from ericl/spark-12346.

(cherry picked from commit 5e492e9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
asfgit closed this in 5e492e9 on Jan 18, 2016

mengxr commented Jan 18, 2016

Merged into master and branch-1.6. Thanks! I created https://issues.apache.org/jira/browse/SPARK-12886 to track some follow-up tasks.
